<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
<meta name="generator" content="AsciiDoc 8.6.3" />
<title>Psychoacoustic Models in TwoLAME</title>
<link rel="stylesheet" href="./twolame.css" type="text/css" />
<script type="text/javascript">
/*<![CDATA[*/
window.onload = function(){asciidoc.footnotes();}
/*]]>*/
</script>
<script type="text/javascript" src="./asciidoc-xhtml11.js"></script>
</head>
<body class="article">
<div id="header">
<h1>Psychoacoustic Models in TwoLAME</h1>
<span id="revnumber">version 0.3.13</span>
</div>
<div id="content">
<div class="sect1">
<h2 id="_introduction">Introduction</h2>
<div class="sectionbody">
<div class="paragraph"><p>In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which
are the sonically important parts of the waveform that is being encoded.  The PAM
looks for loud sounds which may mask soft sounds, noise which may affect the level
of sounds nearby, sounds which are too soft for us to hear and should be ignored
and so on.  The information from the PAM is used to determine which parts of the
spectrum should get more bits and thus be encoded at greater quality - and which
parts are inaudible/unimportant and should thus get fewer bits.</p></div>
<div class="paragraph"><p>In MPEG Audio LayerII encoding, 1152 sound samples are read in - this constitutes
a <em>frame</em>. For each frame the PAM outputs just <strong>32</strong> values
(The values are the Signal to Masking Ratio [SMR] in that subband). This is important!
There are only 32 values to determine how to alloctate bits for 1152 samples - this
is a pretty coarse technique.</p></div>
<div class="paragraph"><p>The different PAMs listed below use different techniques to decide on these 32
values. Some models are better than others - meaning that the 32 values chosen
are pretty good at spreading the bits where they should go.  Even with a really
bad PAM (e.g. Model -1) you can still get satisfactory results a lot of the time.
All of these models have strengths and weaknesses.  The model <em>you</em> end up using
will be the one that produces the best sound for your ears, for your audio.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_psychoacoustic_model_1">Psychoacoustic Model -1</h2>
<div class="sectionbody">
<div class="paragraph"><p>This PAM doesn&#8217;t actually look at the samples being encoded to decide upon the
output values.  There is simply a set of 32 default values which are used,
regardless of input.</p></div>
<div class="paragraph"><p><strong>Pros</strong>: Faaaast. Low complexity. Surprisingly good.
"Surprising" in that the other PAMs go to the effort of calculating FFTs
and subbands and masking, and this one does absolutely <strong>nothing</strong>.
Zip. Nada. Diddly Squat. This model might be the best example of why
it is hard to make a good model - if having no computations sounds OK,
how do you improve on it?</p></div>
<div class="paragraph"><p><strong>Cons</strong>: Absolutely no attempt to consider any of the masking effects that
would help the audio sound better.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_psychoacoustic_model_0">Psychoacoustic Model 0</h2>
<div class="sectionbody">
<div class="paragraph"><p>This PAM looks at the sizes of the <em>scalefactors</em> for the audio and combines
it with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.</p></div>
<div class="paragraph"><p><strong>Pros</strong>: Faaast. Low complexity.</p></div>
<div class="paragraph"><p><strong>Cons</strong>: This model has absolutely no mathematical basis and does not use
any perceptual model of hearing.  It simply juggles some of the numbers of
the input sound to determine the values. Feel free to hack the daylights out
of this PAM - add multipliers, constants, log-tables <strong>anything</strong>. Tweak it until
you begin to like the sound.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_psychoacoustic_model_1_and_2">Psychoacoustic Model 1 and 2</h2>
<div class="sectionbody">
<div class="paragraph"><p>These PAMs are from the ISO standard. Just because they are the standard,
doesn&#8217;t mean that they are any good. Look at LAME which basically threw out
the MP3 standard psycho models and made their own (GPSYCHO).</p></div>
<div class="paragraph"><p><strong>Pros</strong>: A reference for future PAMs</p></div>
<div class="paragraph"><p><strong>Cons</strong>: Terrible ISO code, buggy tables, poor documentation.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_psychoacoustic_model_3">Psychoacoustic Model 3</h2>
<div class="sectionbody">
<div class="paragraph"><p>A re-implementation of psychoacoustic model 1.  ISO11172 was used as the guide
for re-writing this PAM from the ground up.</p></div>
<div class="paragraph"><p><strong>Pros</strong>: No more obscure tables of values from the ISO code. Hopefully a good
base to work upon for tweaking PAMs</p></div>
<div class="paragraph"><p><strong>Cons</strong>: At the moment, doesn&#8217;t really sound any better than PAM1</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_psychoacoustic_model_4">Psychoacoustic Model 4</h2>
<div class="sectionbody">
<div class="paragraph"><p>A cleaned up version of PAM2.</p></div>
<div class="paragraph"><p><strong>Pros</strong>: Faster than PAM2. No more obscure tables of values from the ISO
standard. Hopefully a good base to work from for improving the PAMs</p></div>
<div class="paragraph"><p><strong>Cons</strong>: Still has the same "warbling"/"Davros" problems as PAM2.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_future_psychoacoustic_models">Future psychoacoustic models</h2>
<div class="sectionbody">
<div class="paragraph"><p>There&#8217;s a heap that could be done. Unfortunately, I&#8217;ve got a set of tin
ears, crappy speakers and a noisy computer room.  If you&#8217;ve got the
capability to do proper PAM testing then please feel free to do so.
Otherwise, I&#8217;ll just keep plodding along with new ideas as they
arise, such as:</p></div>
<div class="ulist"><ul>
<li>
<p>
Temporal masking (there&#8217;s no pre-echo or anything in TwoLAME)
</p>
</li>
<li>
<p>
Left Right Masking
</p>
</li>
<li>
<p>
A PAM that&#8217;s fully tuneable from the command line?
</p>
</li>
<li>
<p>
Graphical output of SMR values etc. Would allow better debugging of PAMs
</p>
</li>
<li>
<p>
Re-sampling routines
</p>
</li>
<li>
<p>
Low/High pass filtering
</p>
</li>
</ul></div>
</div>
</div>
</div>
<div id="footnotes"><hr /></div>
<div id="footer">
<div id="footer-text">
Version 0.3.13<br />
Last updated 2011-01-01 19:15:01 GMT
</div>
</div>
</body>
</html>
